Interpretable Multi-Head Attention
Interpretable Multi-Head Attention
ls-type:: annotation
hl-page:: 9
hl-color:: yellow
Changes relative to [[Multi-Head Attention]]
Parameter perspective #card
- V: a single set of value parameters shared across all heads (share values in each head)
ls-type:: annotation
hl-page:: 9
hl-color:: yellow
- Q and K: independent parameters for each head
- Rationale for sharing V: given that different values are used in each head, attention weights alone would not be indicative of a particular feature's importance.
ls-type:: annotation
hl-page:: 9
hl-color:: yellow
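To make the parameter difference concrete, here is a minimal NumPy sketch that counts weights in both variants. It assumes no bias terms, per-head width d_attn for queries, keys and values, a standard layer that concatenates m_H heads and maps m_H·d_attn back to d_model, and an interpretable layer whose final map W_H is d_attn × d_model. The function names are hypothetical and only serve as illustration.

```python
import numpy as np

def init_interpretable_params(d_model, n_heads, d_attn, seed=0):
    """Hypothetical initializer for the interpretable variant (no biases)."""
    rng = np.random.default_rng(seed)
    return {
        # per-head query/key projections: the leading axis indexes the head
        "W_Q": rng.normal(size=(n_heads, d_model, d_attn)),
        "W_K": rng.normal(size=(n_heads, d_model, d_attn)),
        # one value projection shared by every head (no head axis)
        "W_V": rng.normal(size=(d_model, d_attn)),
        # final linear map from the averaged head output back to d_model
        "W_H": rng.normal(size=(d_attn, d_model)),
    }

def count_standard_mha(d_model, n_heads, d_attn):
    # Q, K and V each get their own projection per head; the concatenated
    # output (n_heads * d_attn wide) is mapped back to d_model
    return 3 * n_heads * d_model * d_attn + n_heads * d_attn * d_model

params = init_interpretable_params(d_model=16, n_heads=4, d_attn=4)
n_interp = sum(w.size for w in params.values())
n_standard = count_standard_mha(d_model=16, n_heads=4, d_attn=4)
print(n_interp, n_standard)  # 640 1024
```

The saving comes entirely from the value side: m_H value projections collapse into one, and W_H reads d_attn inputs instead of m_H·d_attn.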
How the attention scores are combined #card
- Average the attention-weighted V over all heads (employ additive aggregation of all heads)
ls-type:: annotation
hl-page:: 9
hl-color:: yellow
- The original Multi-Head Attention concatenates the heads instead
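A minimal NumPy sketch of the averaging step, assuming a single unbatched sequence, no masking and no dropout; the function and variable names are hypothetical. Because every head multiplies the same shared values V·W_V, averaging the head outputs is the same as applying one averaged attention matrix to those values, which is what lets that matrix be read as feature importance. Concatenation would instead leave m_H separate blocks with no single matrix to inspect.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(q, k):
    # A(Q, K) = softmax(Q K^T / sqrt(d_attn)), one row per query position
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def interpretable_multihead(q_in, k_in, v_in, W_Q, W_K, W_V, W_H):
    """Sketch: W_Q, W_K have shape (n_heads, d_model, d_attn); W_V is shared."""
    vw = v_in @ W_V                       # shared values, projected once for all heads
    heads, attn = [], []
    for h in range(W_Q.shape[0]):
        a_h = attention_weights(q_in @ W_Q[h], k_in @ W_K[h])
        attn.append(a_h)
        heads.append(a_h @ vw)            # every head reads the same value matrix
    h_tilde = np.mean(heads, axis=0)      # additive aggregation (average), not concat
    a_tilde = np.mean(attn, axis=0)       # single attention matrix for interpretation
    return h_tilde @ W_H, a_tilde

rng = np.random.default_rng(0)
T, d_model, n_heads, d_attn = 5, 8, 2, 4
x = rng.normal(size=(T, d_model))         # self-attention: Q = K = V = x
W_Q = rng.normal(size=(n_heads, d_model, d_attn))
W_K = rng.normal(size=(n_heads, d_model, d_attn))
W_V = rng.normal(size=(d_model, d_attn))
W_H = rng.normal(size=(d_attn, d_model))

out, a_tilde = interpretable_multihead(x, x, x, W_Q, W_K, W_V, W_H)
# averaging is linear, so the averaged matrix alone reproduces the output
print(np.allclose(out, a_tilde @ (x @ W_V) @ W_H))  # True
```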
#card InterpretableMultiHead formula
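The formula itself is not preserved in this note. As given in the Temporal Fusion Transformer paper (Lim et al.), which this layer comes from, it reads roughly as follows (notation reproduced from that paper and may differ slightly): m_H heads, per-head projections W_Q^{(h)} and W_K^{(h)}, one value projection W_V ∈ ℝ^{d_model × d_V} shared by all heads, and a final map W_H ∈ ℝ^{d_attn × d_model}. The standard concatenating version is included for contrast.

```latex
\begin{aligned}
A(Q, K) &= \operatorname{Softmax}\!\left(\frac{Q K^\top}{\sqrt{d_{attn}}}\right),
  \qquad \operatorname{Attention}(Q, K, V) = A(Q, K)\, V \\
\operatorname{MultiHead}(Q, K, V) &= [H_1, \dots, H_{m_H}]\, W_H,
  \qquad H_h = \operatorname{Attention}\!\left(Q W_Q^{(h)}, K W_K^{(h)}, V W_V^{(h)}\right) \\
\operatorname{InterpretableMultiHead}(Q, K, V) &= \tilde{H}\, W_H \\
\tilde{H} &= \tilde{A}(Q, K)\, V W_V
  = \left\{ \frac{1}{m_H} \sum_{h=1}^{m_H} A\!\left(Q W_Q^{(h)}, K W_K^{(h)}\right) \right\} V W_V \\
  &= \frac{1}{m_H} \sum_{h=1}^{m_H} \operatorname{Attention}\!\left(Q W_Q^{(h)}, K W_K^{(h)}, V W_V\right)
\end{aligned}
```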
https://blog.xiang578.com/post/logseq/Interpretable Multi-Head Attention.html